Integrative analysis of gene expression profiles of substantia nigra identifies potential diagnosis biomarkers in Parkinson's disease

Parkinson's disease (PD) is a progressive neurodegenerative disease whose etiology is attributed to development of Lewy bodies and degeneration of dopaminergic neurons in the substantia nigra (SN). Currently, there are no definitive diagnostic indicators for PD. In this study, we aimed to identify potential diagnostic biomarkers for PD and analyzed the impact of immune cell infiltrations on disease pathogenesis. The PD expression profile data for human SN tissue, GSE7621, GSE20141, GSE20159, GSE20163 and GSE20164 were downloaded from the Gene Expression Omnibus (GEO) database for use in the training model. After normalization and merging, we identified differentially expressed genes (DEGs) using the Robust rank aggregation (RRA) analysis. Simultaneously, DEGs after batch correction were identified. Gene interactions were determined through venn Diagram analysis. Functional analyses and protein–protein interaction (PPI) networks were used to the identify hub genes, which were visualized through Cytoscape. A Lasso Cox regression model was employed to identify the potential diagnostic genes. The GSE20292 dataset was used for validation. The proportion of infiltrating immune cells in the samples were determined via the CIBERSORT method. Sixty-two DEGs were screened in this study. They were found to be enriched in nerve conduction, dopamine (DA) metabolism, and DA biosynthesis Gene Ontology (GO) terms. The PPI network and Lasso Cox regression analysis revealed seven potential diagnostic genes, namely SLC18A2, TAC1, PCDH8, KIAA0319, PDE6H, AXIN1, and AGTR1, were subsequently validated in peripheral blood samples obtained from healthy control (HC) and PD patients, as well as in the GSE20292 dataset. The results revealed the exceptional sensitivity and specificity of these genes in PD diagnosis and monitoring. Moreover, PD patients exhibited a higher number of plasma cells, compared to HC individuals. The SLC18A2, TAC1, PCDH8, KIAA0319, PDE6H, AXIN1, and AGTR1 are potential diagnostic biomarkers for PD. Our findings also reveal the essential roles of immune cell infiltration in both disease onset and trajectory.


Materials and methods
Data source and pre-processing.
Five PD datasets (GSE7621, GSE20141, GSE20159, GSE20163 and GSE20164) were downloaded from the GEO database (https:// www.ncbi.nlm.nih.gov/ geo/), merged and used as the training set, which contains 48 healthy control (HC) and 56 PD patients.The GSE20292 dataset, which consists of transcriptional analysis data for the whole SN in PD, was used for validation (Table 1).The 'sva' package (https:// bioco nduct or.org/ packa ges/ relea se/ bioc/ html/ sva.html) was used to remove batch effects from the training sets.The principal component analysis (PCA) cluster plot was used to visualize the effects of removing inter-batch differences.The workflow of this study is shown in Fig. 1.

Establishment of the PPI network and identification of hub genes.
To establish the interplay of DEGs, PPI network analysis was performed using the STRING database (https:// cn.string-db.org/).A stringent interaction score threshold of 0.4 was employed to identify the most reliable and relevant interactions among the DEGs.Genes were denoted by nodes while connections between them were represented by edges.Subsequently, the main regulatory network was constructed and visualized using Cytoscape (https:// cytos cape.org/, version 3.9.1).The cytoHubba plugin of Cytoscape was used to identify the hub genes in the PPI network.A systematic evaluation of central genes was performed using a comprehensive repertoire of ten distinct methods: MCC, DMNC, Degree, BottleNeck, EcCentricity, Closeness, MNC, Radiality, Stress, www.nature.com/scientificreports/andBetweenness.By synthesizing the results obtained from these diverse approaches, the top 39 hub genes were identified, representing a collective selection based on their high-ranking positions across the ten methods.

Identification of PD biomarkers via Lasso Cox regression analysis.
The Lasso Cox regression analysis, a penalized regression technique, is effective for predicting outcomes and has low correlations, making it suitable for selecting the most relevant features in datasets with various variables.The 'glmnet' package(https:// cran.r-project.org/ web/ packa ges/ glmnet/) was used to extract and fit the consensus DEGs expression profiles into the Lasso Cox regression model.To identify the DEGs between HC and PD patients after Lasso Cox regression analysis, a heatmap and volcano map were drawn in the training set.

Evaluation of the diagnostic model in verification set.
A heatmap and a violin map were drawn to identify the differentially expressed hub genes between HC and PD patients in the verification set.Then, the efficacy of distinguishing between HC and PD patients was tested by receiver operating characteristic (ROC) curve analysis.

Immune cell infiltration analysis.
The R package 'CIBERSORT' (https:// github.com/ Moone rss/ CIBER SORT) was used to calculate the fractions of immune infiltrating cells in the training set, which were visualized using the boxplot.The Violin Plot was used to identify statistically significant differences in immune cells between HC and PD patients (P value < 0.05).

RT-qPCR
Independent peripheral blood samples from PD patients were used to assess the practical efficacy of our diagnostic model.Clinical whole blood samples were collected from both HC and PD patients, and total DNA were extracted using the FastPure Blood DNA Isolation Mini Kit V2.The β-Actin primer pair was used as the internal control.Relative gene expressions were calculated and normalized via the Ct technique and ΔΔCt method.Clinical information for PD patients used in the RT-qPCR is detailed in Supplementary Table S1.Primer sequences are detailed in Table 2.

Statistical analysis
Data preparation, functional analysis, and modeling were performed using R software (https:// cran.r-project.org/, v4.1.3).All P values were two-sided and differences with P < 0.05 and log |FC|< 0.585 were considered significant.Cytoscape was utilized to visualize the PPI network.ROC curves analysis was derived using 'pROC' packages (https:// cran.r-project.org/ web/ packa ges/ pROC/).Intergroup comparisons were performed via the Wilcox test.

Ethical approval
In this study, we confirm that all experiments and methods were conducted strictly in accordance with relevant guidelines and regulations.The experimental protocols have been approved by the ethics committee to ensure the legality and ethical compliance of the research.Hereby declare that informed consent was obtained from all human participants, including the use of tissue samples, involved in our research study.The human tissue sample collection for this study have obtained approval from the following institution: The first affiliated hospital of guangxi medical university.The number of the approving: 2023-E258-01.
Table 2. Primer sequences used in RT-qPCR.

Results
Identification of DEGs.
Five datasets (GSE7621, GSE20141, GSE20159, GSE20163 and GSE20164) with a total of 104 samples were included as the training set.Batch correction based on the 'sva' R package was performed to diminish the potential for bias to be introduced by batch effects on subsequent analyses.Boxplot and PCA revealed the differences before and after batch correction, indicating the successful removal of batch effects (Fig. 2A-D).The R package 'RobustRankAggreg' was used to perform the RRA analysis (Fig. 3A).Batch correction was performed on the training set (Fig. 3B).A total of 80 DEGs (nine upregulated and 71 downregulated) between HC and PD patients were identified via RRA and batch correction.Based on the R package of 'venn' (https:// cran.r-project.org/ web/ packa ges/ venn/), There are 62 DEGs were detected between RRA analysis and batch correction.(Fig. 3C).In conclusion, by integrating the datasets and treating them as the training set, we were able to filter out 62 DEGs using RRA and batch correction, thereby laying the foundation for subsequent analyses.

GO and KEGG pathway functional enrichment analysis of DEGs.
The GO and KEGG enrichment analyses were performed to determine the functions of the 62 significant DEGs (Fig. 4A,B).The enriched GO terms were nerve conduction, dopamine (DA) metabolism, and DA biosynthesis.
The enriched KEGG terms included PD and the cycle of nerve conduction.Association of proteins and the top five GO and KEGG terms were visualized using a cnetplot (Fig. 4C,D).Based on our analysis, it is possible that the identified DEGs are closely associated with the nervous system and may potentially impact PD pathogenesis.

PPI network analysis and hub genes selection
The PPI networks were analyzed in the STRING online database to elucidate on protein interactions among the DEGs (Fig. 5A).Subsequently, the main regulatory network was constructed and visualized using Cytoscape Consequently, a total of 39 genes were identified (Fig. 5B).Ranked and networks of the genes for each approach were listed in Supplementary Table S1 and Supplementary Figure S1.Expression profiles of the hub genes were extracted and integrated into the Lasso Cox regression model to identify the possible biomarkers for PD.To build a model that facilitates quantification of each patient with accuracy, seven of the 39 DEGs, including SLC18A2, TAC1, PCDH8, KIAA0319, PDE6H, AXIN1, and AGTR1 were retained by application of the Lasso Cox regression model with a minimum of λ (Fig. 5C).The distribution of these genes is presented in the volcano plot (Fig. 5D).Expression levels of the above genes significantly differed between the two sets as shown in the heatmap (Fig. 5E).These findings imply that the hub genes may have pivotal functions in biological mechanisms underlying PD and are promising therapeutic targets.Immune cell infiltration analysis.
Progression of neurodegenerative processes in PD may be sustained by changes in immune cell markers that induce or worsen neuroinflammation 6,16 .Therefore, to establish the constitution of immune cells in PD samples for the purpose of obtaining the difference between HC and PD patients in relative abundance of immune cells, the immune status was evaluated.Constituent ratios for 22 infiltrating immune cell types were as shown in Fig. 6A.The PD patients exhibited higher counts of plasma cells compared with HC (Fig. 6B).The intensity of plasma cell expressions, as evidenced by the image, was closely associated with PD.
Performance of the diagnostic model in SN datasets and samples.
To assess the clinical efficacy of our diagnostic model, we assessed it using SN datasets (GSE20292).Then, we developed a heatmap and a violin plot between HC and PD patients for the seven hub genes (Fig. 7A,B).Hub gene expression levels in the dataset were calculated using the ROC curves to identify the corresponding area under the curve (AUC).In Fig. 7C, AUCs for SLC18A2, TAC1, PCDH8, KIAA0319, PDE6H, AXIN1, and AGTR1 in HC and PD patients were 0.864, 0.652, 0.863, 0.934, 0.65, 0.722, and 0.773, respectively.When the seven hub genes were combined into a single variable, their predictive value improved; the AUC for the model was 0.965 (Fig. 7D).Thus, the hub genes have a high predictive accuracy for diagnosis.

Validation of expression level of the seven hub genes in peripheral blood samples.
The RT-qPCR assay was performed to validate the efficacy of the prediction model.Relative expressions of TAC1, PDE6H, KIAA0319, AGTR1, SLC18A2, and PCDH8 in HC were significantly higher than those in PD patients, while those of AXIN1 were significantly lower in HC, relative to PD patients (Fig. 8).These results from peripheral blood samples are consistent with the GSE20292 dataset performed on SN, indicating that these genes hold promise as prospective therapeutic targets for PD patients.

Discussion
The gold standard for diagnosing PD relies on the presence of SN pars compacta degeneration and Lewy pathology, as confirmed by post-mortem pathological examination 23,24 .Pathologically, PD is a slowly progressing neurological disease that begins years before a diagnosis 25 .Diagnostic examinations that allow for clear diagnosis during the initial phases of the disease, in particular, do not exist.Therefore, there is a need to develop suitable diagnostic approaches for early diagnosis.Through bioinformatics analyses, we aimed at identifying diagnostic genes that are related to the disease.
In previous studies, we acknowledge that some datasets were utilized in prior research by Zheng et al. 26 and Kelly et al 27 .Zheng 's study conducted a genome-wide meta-analysis of gene sets in PD, identifying 10 gene sets associated with the disease.Key findings include defects in mitochondrial electron transport, glucose utilization, and glucose sensing occurring early in disease pathogenesis.In Kelly's study, DEGs were found to be associated with perturbed pathways, including mitochondrial dysfunction and oxidative stress.In our study, based on GSE7621, GSE20141, GSE20159, GSE20163, and GSE20164, RRA and batch correction were performed to identify the consensus DEGs, which were 80 in number.Among them, 71 genes were downregulated while nine genes were upregulated.Functional enrichment analysis revealed that the enriched GO terms were mainly in nerve conduction, DA metabolism, and DA biosynthesis.Pathologically, PD is associated with both neuronal dysfunction and inflammation of the central nervous system, consistent with nerve conduction disorders 28 .Dysregulated DA is more likely to play a crucial role in early onset of PD, thus, early identification of dysregulated DA should be a priority 29 .The enriched KEGG terms were also mainly in nerve conduction.Analysis of the PPI network revealed 39 hub genes, which were visualized by Cytoscape.The Lasso Cox regression model was used to assess the diagnostic genes with a high accuracy, which were SLC18A2, TAC1, PCDH8, KIAA0319, PDE6H, AXIN1, and AGTR1.CIBERSORT was used to assess immune cell infiltrations in PD.Plasma cells were found to be differentially expressed between HC and PD patients.Compared with previous studies, our study integrates multiple datasets and employs advanced methods such as RRA analysis and immune cell infiltration analysis, we believe that our work contributes to a more comprehensive understanding of the molecular mechanisms underlying PD.
SLC18A2, the vesicular monoamine transporter 2, is important in neurotransmitter transportation.It packages histamine into vesicles in preparation for neurotransmitter release from the presynaptic neuron 30 .The gene, which is important in the monoaminergic signaling pathway, has been extensively researched on.In the absence of this monoaminergic transporter, histamine immunoreactivity is significantly suppressed in neuronal cell bodies of the brain 31 .Reduced histamine metabolism in the central nervous system is a preventative measure against PD onset 32 .Further elucidation of the involved mechanisms will contribute to a better understanding of the disease.Substance P (SP) dysregulation is associated with the etiology of PD.Mice lacking endogenous SP (TAC 1-/-) exhibited greater resistance to nigral dopaminergic neurodegeneration than wild-type controls.The neuroinflammatory and dopaminergic neurodegenerative effects of SP are mediated by microglial NOX2 33 , thus, they may shed new light on PD pathophysiology.Protocadherins, which contain PCDH8, are calcium-dependent adhesion molecules that have drawn interest for their potential functions in development of neural circuits and their potential effects on neurological illnesses.Physiologically, PCDH8 is involved in development and maintenance of intrahippocampal circuits 34 .However, the association between PCDH8 and PD has yet to be established.KIAA0319 is associated with extracellular signaling pathways 35,36 .Extracellular signaling pathways and endocytosis are both necessary to control neurogenesis 37 .As a pivotal upstream gatekeeper, KIAA0319 plays crucial roles in neurogenesis by arresting cellular progression at the neural progenitor cell stage.Cell cycle progression is deregulated in PD, and key regulators of the G1/S transition checkpoint are significantly altered 38 .Moreover, the cell cycle is enriched in GO terms.Even though there is no conclusive proof that KIAA0319 is directly associated with PD, our findings shed light on the subject.PDE6H is associated with changes in circadian rhythms that are involved in aging 39 , however, the association between PDE6H and PD has not been determined.AXIN1 was overexpressed in hippocampus tissues and cells from MPTP-lesioned mice models of PD.AXIN1 suppression in PD suppressed hippocampus neuron apoptosis.AXIN1 downregulation suppresses DA neuron death via miR-128 40 .These outcomes suggest a therapeutic potential for AXIN1 in treatment of PD.AGTR1 encodes the angiotensin II type 1 receptor (AT1R), whose expressions decreases in dopaminergic neurons of the SN as PD advances 41 .In contrast, AT1R upregulation induces the release of pro-inflammatory cytokines, leading to inflammation that culminates in dopaminergic cell death and dysfunction.Blocking AT1R with its antagonist can attenuate neurotoxin-induced degeneration of dopaminergic neurons in the SN 42,43 .These findings underscore the complex nature of AGTR1's role in PD and highlight the need for further research.
The GO function analysis revealed that the DEGs were primarily enriched in nerve conduction, DA metabolism, and DA biosynthesis.Nerve conduction is divided into sensory nerve conduction and motor nerve conduction 44 .Regarding nerve conduction, Toth reported that individuals with PD exhibited slower motor conduction velocities, when compared to HC. Toth hypothesized that peripheral neuropathy in PD could be attributed to levodopa exposure and elevated levels of methylmalonic acid 45 .Thus, motor nerve transmission abnormalities maybe present in PD patients 46 .Pathologically, PD is characterized by degeneration of the nigrostriatal dopaminergic system.In 1988, Gotham et al. proposed the 'dopamine overdose' hypothesis 47 .The hypothesis, which proposes an impact on cognitive functions in PD, suggests that the medication doses required to restore DA functions in the most severely affected regions may be too high for less affected areas.According to this theory, the ventral striatum, which remains relatively intact, can become excessively stimulated when DA replacement therapy is administered.Subsequently, this overstimulation affects the limbic system, leading to impaired executive functions mediated by the limbic and orbitofrontal systems, such as learning and risk-taking.Empirical evidence from literature and clinical observations, particularly in relation to DA agonists, provides support for this theory and the emergence of impulse control disorders 48 .The enriched KEGG terms also included the PD and cycle of nerve conduction.The diagnostic gene, SLC18A2, is also associated with neurotransmitter transport, synaptic vesicle cycle, and dopaminergic synapses.In terms of neurotransmitter transport, mutations in SLC18A2 will impact the transmission of monoamine neurotransmitters, leading to a phenotype that shares characteristics with all monoamine-related disorders 49 .In terms of the relationship between SLC18A2 and DA, SLC18A2 is a vesicular monoamine transporter that is essential in DA regulation.Suppressed SLC18A2 activities might reduce DA release 50 .
Studies have reported on significantly changed B cell subpopulation structures in PD 51 .Given that plasma cells are derived from B cells, the abundance of plasma cells is associated with PD development.Plasma cells may affect PD pathogenesis by influencing the immune microenvironment.However, the relationship between plasma cells and PD has not been fully defined.Systematic studies should be performed to elucidate on possible mechanisms of plasma cells in PD.
Recognizing the practical challenges of obtaining brain tissue for routine diagnosis, we considered the more accessible and commonly used sample type, namely peripheral blood.Our intention was to validate the identified hub genes, derived from PD patients' brain tissues, in peripheral blood.This approach is motivated by the fact that obtaining brain tissue for diagnosing suspected PD patients is impractical in routine clinical settings.Peripheral blood, being a commonly used and less invasive sample, is more feasible for routine diagnostic checks.We aimed to assess whether the gene expression patterns in blood align with those in brain tissues, with the goal of reducing diagnostic complexity and improving efficiency.However, our study has some limitations.First, even though we combined the datasets due to the lower number of samples in PD, studies with bigger sample sizes should be performed to confirm our findings.Second, we utilized RT-qPCR to validate the conclusions in blood samples of PD patients.Other experimental verification methods should be used to verify our results.Nowadays, larger-scale single cell RNA sequencing analysis and multi-omics with more clinical samples is needed to further elucidate the exact mechanism of the disease.We can make it realize in PD in the further research.
In conclusion, the developed diagnostic model provides new insights for early stage PD diagnosis.Plasma cells were found to be differentially expressed between HC and PD patients.Elucidation of the genetic and immunological mechanisms that underlie the initial signs of PD will unlock new therapeutic avenues.These insights will empower clinicians to effectively intervene with innovative or repurposed anti-inflammatory and immunomodulatory treatments, with the aim of slowing down the progression of this disease.

Conclusion
The SLC18A2, TAC1, PCDH8, KIAA0319, PDE6H, AXIN1, and AGTR1 are associated with PD pathology, and are potential diagnostic markers for PD.Besides, immune cell infiltrations might play an important role in PD.These findings hold promising implications for PD diagnosis and treatment.

Figure 2 .
Figure 2. The datasets were normalized, and the batch effects removed.(A,B) Boxplot of batch effects of combined sets before and after normalization.(C, D) A two-dimensional PCA cluster plot of datasets before and after batch effects removal.

Figure 3 .
Figure 3. Identification of DEGs.(A) RRA analysis identified the DEGs.P value-based heatmap showing the top 35 upregulated and downregulated genes.(B) Heatmap of DEGs in HC and PD patients.(C) Venn diagram of DEGs between RRA analysis and batch correction.

Figure 4 .
Figure 4.The GO analysis and KEGG pathway analysis of DEGs.(A) Dot plot showing GO analysis of DEGs.(B) Dot plot showing KEGG pathway analysis of DEGs.(C) Circle graph showing the DEGs that were enriched in the top 5 GO categories of BPs.(D) Circle graph showing the DEGs that were enriched in KEGG pathway analysis.

Figure 5 .
Figure 5. PPI network and hub gene selection.(A) The top 62 hub genes in the PPI network of DEGs based on node degree.(B) Hub genes were identified by taking the interplay of the first 39 genes in the ten classification methods of cytoHubba.(C) Hub gene selection in the Lasso Cox regression model.Vertical lines were drawn at optimal values by the minimum criteria and the 1-SE criteria.(D) The hub gene landscape in the volcano map.© The heatmap of hub genes in HC and PD patients.

Figure 6 .
Figure 6.CIBERSORT tumor-infiltrating immune cell analysis.(A) The proportion of 22 immune cells infiltrating the samples.(B) A violin plot of differential abundance of infiltrating immune cells between HC and PD patients.